TASK-209191 block global trace while inheriting thread data#12
TASK-209191 block global trace while inheriting thread data#12balupillai wants to merge 4 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to prevent the global tracing GC from running while a thread’s heap/object data is being inherited into another thread, by adding an explicit collector barrier around the inheritance operation.
Changes:
- Add per-OS-thread state lookup in
luaC_inherit_thread. - Block the collector before inheriting a thread’s heap contents and unblock afterward.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| GCheader *steal, *tmp; | ||
| thr_State *pt; | ||
|
|
||
| lua_lock(th); |
There was a problem hiding this comment.
Double-locking th causes deadlock with existing callers
High Severity
The newly added lua_lock(th) inside luaC_inherit_thread double-locks th because at least two call sites already hold that lock. In reclaim_object (lgc.c), the caller does lua_lock(th) before calling luaC_inherit_thread(L, th). In lua_delrefthread (lapi.c), lua_lock(L) is called on the source thread which becomes the th parameter. If lua_lock is a non-recursive mutex, this will deadlock.
Additional Locations (1)
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| } LUAI_TRY_END(L); | ||
| if (was_suspended) { | ||
| ck_pr_dec_32(&L->gch.ref); | ||
| } |
There was a problem hiding this comment.
Ref count decremented on failed resume while status unchanged
High Severity
When lua_resume is called on a suspended coroutine but fails with "C stack overflow," resume_error does not change L->status (it stays LUA_SUSPEND), yet ck_pr_dec_32(&L->gch.ref) still executes because was_suspended is true. On a subsequent retry of lua_resume, was_suspended is true again, causing another decrement — but the ref was only incremented once in lua_suspend. This ref count underflow can lead to premature GC collection of a still-live coroutine.


Note
Medium Risk
Touches low-level coroutine lifecycle and concurrent GC/heap transfer logic; incorrect refcounting or lock ordering could cause leaks, deadlocks, or crashes under concurrency.
Overview
Prevents garbage-collection races around thread suspension and reclaimed-thread heap transfer.
lua_suspendnow increments the coroutine GC refcount andlua_resumedecrements it when resuming fromLUA_SUSPEND, pinning suspended coroutines until they are resumed.luaC_inherit_threadnow takes the target thread lock and blocks the collector while moving objects/heaps, ensuring global trace cannot run concurrently with heap inheritance and avoiding lock contention/corruption during transfer.Written by Cursor Bugbot for commit 13ee876. This will update automatically on new commits. Configure here.